DOI: 10.1201/9781003355205-1
Chapter 1
Sequencing and Raw Sequence Data Quality Control
1.1 NUCLEIC ACIDS
Nucleic acids are the molecules that every living organism must have. They carry the
information that directs biological activities in cells and determines the inherited
characteristics of the living organism. The two main kinds of nucleic acids are
deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the master blueprint for
life, or the book of life, and it constitutes the genetic material of prokaryotic and
eukaryotic cells and of DNA viruses. RNA is the genetic material of the RNA viruses, but it
is also found in other organisms as molecules transcribed from DNA that play important
biological roles such as protein synthesis and gene regulation. The complete set of DNA
molecules in a prokaryotic or eukaryotic cell is called the genome. RNA is the genome of
only some viruses (RNA viruses).
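As a minimal illustration of the DNA-to-RNA relationship described above, the following Python sketch derives an RNA sequence from a DNA coding-strand sequence by writing uracil (U) wherever the DNA has thymine (T). The function name `transcribe` and the example sequence are hypothetical, and the sketch only models the change of alphabet, not the cellular process of transcription itself.

```python
def transcribe(dna: str) -> str:
    """Return the RNA sequence matching a DNA coding-strand sequence.

    Assumption: the input is the coding (sense) strand written 5' to 3',
    so the transcript has the same sequence with U in place of T.
    """
    dna = dna.upper()
    # Reject anything that is not one of the four DNA nucleotide symbols.
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError(f"unexpected DNA symbols: {sorted(invalid)}")
    return dna.replace("T", "U")


if __name__ == "__main__":
    # Hypothetical example sequence, used only for illustration.
    print(transcribe("ATGGCCTTA"))  # AUGGCCUUA
```

Restricting the input to the symbols A, C, G, and T keeps the sketch simple; real sequence data may also contain ambiguity codes such as N, which this version deliberately rejects.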
A nucleic acid (DNA/RNA) is a polymer made up of building blocks called nucleotides. A
nucleotide consists of (i) a sugar molecule (deoxyribose in DNA or ribose in RNA) attached
to a phosphate group and (ii) a nitrogen-containing base called a nucleobase. In general,
the nucleic acid sequence is made up of four nucleotides distinguished from one another
only by their nitrogen-containing bases: Adenine (A), Cytosine (C), Guanine (G), and
Thymine (T) in the DNA molecule and Adenine (A), Cytosine (C), Guanine (G), and Uracil (U)
in the RNA molecule. These nucleobases are divided into pyrimidine and purine bases.
Pyrimidine bases include cytosine, thymine, and uracil; they are aromatic heterocyclic
organic compounds with a single ring. Purine bases include adenine and guanine, which have
two fused heterocyclic rings. A DNA molecule exists
in the form of two complementary strands (forward and reverse) that wind around each
other forming a double-helix structure. The two strands are held together by hydrogen
bonds formed between the bases (adenine is a base pair of thymine (A/T), and cytosine is a